Skip to content

Database configuration and migration #42

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 38 commits into
base: develop
Choose a base branch
from
Draft

Conversation

JanJakes
Copy link
Contributor

@JanJakes JanJakes commented Apr 15, 2025

This PR extracts existing SQLite database configuration to a WP_SQLite_Configurator and implements WP_SQLite_Information_Schema_Reconstructor with the ability to backfill missing table data in information schema.

How it works:

  1. The configurator checks the SQLite driver version against a version stored in the database, and if the driver is ahead, it will execute the database (re)configuration (see WP_SQLite_Configurator::ensure_database_configured()), while acquiring an EXCLUSIVE lock to prevent race conditions (multiple requests triggering the same migration).
  2. When the configurator runs, it ensures that any internal tables are created (global variables, information schema), and then it proceeds to the WP_SQLite_Information_Schema_Reconstructor.
  3. The reconstructor then compares existing SQLite tables against tables recorded in the information schema, backfills those that are missing, and removes those that no longer exist.
  4. To backfill a table when in WordPress, and it is a WordPress table, the definition is taken from wp_get_db_schema().
  5. To backfill a non-WordPress table, the information schema data is derived from _mysql_data_types_cache, if it exists, otherwise directly from the SQLite schema using SQLite column affinity rules.

@JanJakes JanJakes force-pushed the driver-migration branch 2 times, most recently from 6148a0b to 2250ed3 Compare April 15, 2025 15:13
@JanJakes JanJakes force-pushed the driver-migration branch 4 times, most recently from 7d2c0c1 to 6e94ae8 Compare April 17, 2025 13:42
@JanJakes JanJakes requested a review from adamziel April 17, 2025 14:51
@adamziel
Copy link
Contributor

adamziel commented Apr 17, 2025

Great start! And also a good idea to use a cascade of data sources for backfilling. There's some more way to go, though — I've commented on a bunch of potential issues

Comment on lines 611 to 614
private function escape_mysql_string_literal( string $literal ): string {
// See: https://www.php.net/manual/en/mysqli.real-escape-string.php
return "'" . addcslashes( $literal, "\0\n\r'\"\Z" ) . "'";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmsnell can you poke any holes in this? We assume everything is UTF-8, and if it isn't then GIGO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great to get more eyes on this. The context is:

  1. We know what the DEFAULT 'some-value' is in SQLite.
  2. We want to convert the 'some-value' part to MySQL for a DEFAULT 'some-value' in an exported MySQL CREATE TABLE statement.
  3. What characters that come in raw from SQLite do we need to escape?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 or we could bale out of the escaping part of the problem:

CREATE TABLE  `my_table` (
  `id` INT NOT NULL,
  `name` VARCHAR(45) DEFAULT (FROM_BASE64('aGV5IPCfpJQ='))
) ENGINE = InnoDB;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel Oh, that's an elegant solution! But here's the catch — https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html — it only works for MySQL >= 8.0.13 😞

So, I was digging deeper, and it seems that PDO\SQLite always uses UTF-8. SQLite C API's can also work with UTF-16, but those are not used by PDO.

So generally speaking, escaping ASCII control characters in this way should be fine, we only need to make sure we list them all. It actually seems that mysqli.real-escape-string.php lists less of them than MySQL docs, so I'll need to verify that.

The only issue could be SQLite's GIGO policy — the fact that they don't enforce UTF-8 validity. We could strip off all invalid UTF-8 sequences, here's a neat trick:

htmlspecialchars_decode(htmlspecialchars($s, ENT_NOQUOTES | ENT_IGNORE, 'UTF-8'), ENT_NOQUOTES);

But in that case... what other values should we escape as well? Seems like SQLite's GIGO policy applies to the database schema as well, so an invalid UTF-8 sequence in a table or a column name could cause troubles too. Maybe that's beyond the scope here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything wrong with embracing GIGO on our end. We just have to make sure we're working with a utf-8 connection for this reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, WordPress requires MySQL >= 8.0. We could be a little bit stricter and require 8.0.13 here if that makes it easier. We'll still have to solve the escaping in a medium term but maybe we don't have to block this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel

I don't see anything wrong with embracing GIGO on our end. We just have to make sure we're working with a utf-8 connection for this reason.

In PHP, the SQLite connection will always be UTF-8. However, if someone saves invalid UTF-8, and we later fetch it and try to manipulate it (e.g., escaping it), that could probably cause some issues, but that's the SQLite GIGO.

Also, WordPress requires MySQL >= 8.0.

Unfortunately, it's 5.5.5+ and 5 has still around 30% of the usage: https://wordpress.org/about/stats/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it's 5.5.5+ and 5 has still around 30% of the usage: wordpress.org/about/stats

You're a treasure. Thank you for checking this and not just blindly accepting what I said 🙇

Copy link
Contributor

@adamziel adamziel Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PHP, the SQLite connection will always be UTF-8. However, if someone saves invalid UTF-8, and we later fetch it and try to manipulate it (e.g., escaping it), that could probably cause some issues, but that's the SQLite GIGO.

That's our backend and we can't do any better than that. Let's just embrace it. UTF-8 is a fine constraint to have and I don't think WordPress is actually compatible with any other encoding, other than by accident. We could even refuse to execute any non-utf8 SET NAMES and CREATE TABLE queries which would cleanly resolve the most difficult encoding issues. Even better, a hypothetical developers who uses latin2 would see the error, get annoyed, create an issue, and teach us about their use-case.

@adamziel
Copy link
Contributor

I left some more notes. We're getting close here. Thank you for persevering @JanJakes! This is the last stretch before we can finally enjoy the new parser everywhere ❤️ I know I'm going deep on a piece of code we only need for a one-off migration, but a) it still matters and b) I think we'll reuse it later on for migrating between the database backends.

@JanJakes JanJakes force-pushed the driver-migration branch 3 times, most recently from 35199fa to 9bd0825 Compare April 29, 2025 14:16
@@ -1234,14 +1209,14 @@ public function testColumnWithOnUpdate() {
'name' => '_wp_sqlite__tmp_table_created_at_on_update',
'tbl_name' => '_tmp_table',
'rootpage' => '0',
'sql' => "CREATE TRIGGER \"_wp_sqlite__tmp_table_created_at_on_update\"\n\t\t\tAFTER UPDATE ON \"_tmp_table\"\n\t\t\tFOR EACH ROW\n\t\t\tBEGIN\n\t\t\t UPDATE \"_tmp_table\" SET \"created_at\" = CURRENT_TIMESTAMP WHERE rowid = NEW.rowid;\n\t\t\tEND",
'sql' => "CREATE TRIGGER `_wp_sqlite__tmp_table_created_at_on_update`\n\t\t\t\tAFTER UPDATE ON `_tmp_table`\n\t\t\t\tFOR EACH ROW\n\t\t\t\tBEGIN\n\t\t\t\t UPDATE `_tmp_table` SET `created_at` = CURRENT_TIMESTAMP WHERE rowid = NEW.rowid;\n\t\t\t\tEND",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit – how about heredoc or nowdoc to make it a bit more readable for humans?

@@ -1234,7 +1233,10 @@ private function execute_create_table_statement( WP_Parser_Node $node ): void {
if ( $subnode->has_child_node( 'ifNotExists' ) ) {
$tables_table = $this->information_schema_builder->get_table_name( $table_is_temporary, 'tables' );
$table_exists = $this->execute_sqlite_query(
"SELECT 1 FROM $tables_table WHERE table_schema = ? AND table_name = ?",
sprintf(
Copy link
Contributor

@adamziel adamziel Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purposes of this PR it's lovely. Thank you for all the thorough escaping!

Future-wise, we should start a separate discussion about query formatting. Having two formatting systems is not ideal, even if only because the sprintf() caller may accidentally pull an extra ? into the query. CC @dmsnell

That being said, I wouldn't worry about that too much here. We have to stop somewhere and that seems like a reasonable boundary.

* wrapper that leaks some of the PDO APIs (returns PDOStatement values, etc.).
* In the future, we may abstract it away from PDO and support SQLite3 as well.
*/
class WP_SQLite_Connection {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about extends \PDO as a way of enforcing the interface? Then, we can preface this with if(!class_exists('\PDO')) { class \PDO {} } to prevent fatal errors on PDO-less installations.

Copy link
Contributor

@adamziel adamziel Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder about clarity. Since this is SQLite-specific, in the future we could have a separate WP_MySQL_Over_SQLite_Connection class that will expect a different SQL dialect and would expose MySQL-specific quote and quote_identifier implementation. Nothing actionable here, I'm just thinking out loud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel My thinking here was making an "internal" wrapper of PDO\SQLite or SQLite3 (or both) for connection to the SQLite database. The part that will need to expose a unified API (e.g., PDO-like), is the current WP_SQLite_Driver, I think.

But I see making this adhere to the PDO API may make be useful as well, especially to avoid reinventing the APIs.

As for the clarity of the APIs, maybe we could use the DSN prefix as PDO does? Something like:

$connection = new WP_PDO('mysql:host=localhost;dbname=wordpress'); // MySQL
$connection = new WP_PDO('sqlite:dbname=wordpress'); // SQLite -- this is the one we're discussing now
$connection = new WP_PDO('mysql-on-sqlite:dbname=wordpress'); // MySQL on SQLite -- this is the new driver
$connection = new WP_PDO('pgsql:host=localhost;dbname=wordpress'); // Postgres
$connection = new WP_PDO('pgsql-on-sqlite:dbname=wordpress'); // Postgres on SQLite

Copy link
Contributor

@adamziel adamziel Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, it felt wrong, but the more I think about it the more I like it. It's the most connection interface we can possibly offer. Really nice idea. If I'm to nitpick, maybe I'd do WP_PDO::connect() instead of new WP_PDO just to return the correct class instance right away and save on WP_PDO proxy where each method just calls $this->actual_driver->method()

@@ -1353,7 +1350,7 @@ private function get_column_data_types( WP_Parser_Node $node ): array {
) {
$type = 'mediumtext';
} else {
throw new \RuntimeException( 'Unknown data type: ' . $token->value );
throw new \RuntimeException( 'Unknown data type: ' . $token->get_value() );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up PR idea: use a SQL-specific exception class.

@adamziel
Copy link
Contributor

This is looking really good. It seems like the only blocking thing is string escaping in the DEFAULT part of CREATE TABLE and there may be an easy way out of that problem. Is it possible we're almost there?! 🤞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants